Theory of Generalization
Roadmap
(1) When Can Machines Learn?
(2) Why Can Machines Learn?

Lecture 5: Training versus Testing
effective price of choice in training: (wishfully) growth function $m_{\mathcal{H}}(N)$ with a break point

Lecture 6: Theory of Generalization
• Restriction of Break Point
• Bounding Function: Basic Cases
• Bounding Function: Inductive Cases
• A Pictorial Proof

(3) How Can Machines Learn?
(4) How Can Machines Learn Better?

Theory of Generalization: Restriction of Break Point


The Four Break Points
growth function $m_{\mathcal{H}}(N)$: max number of dichotomies

• positive rays: $m_{\mathcal{H}}(N) = N + 1$
  (o x) $m_{\mathcal{H}}(2) = 3 < 2^2$: break point at 2
• positive intervals: $m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1$
  (o x o) $m_{\mathcal{H}}(3) = 7 < 2^3$: break point at 3
• convex sets: $m_{\mathcal{H}}(N) = 2^N$
  $m_{\mathcal{H}}(N) = 2^N$ always: no break point
• 2D perceptrons: $m_{\mathcal{H}}(N) < 2^N$ in some cases
  $m_{\mathcal{H}}(4) = 14 < 2^4$: break point at 4

break point $k$ ⟹ break point $k + 1$, ...
what else?


Restriction of Break Point (1/2)
what ‘must be true’ when minimum break point $k = 2$

• $N = 1$: every $m_{\mathcal{H}}(N) = 2$ by definition
• $N = 2$: every $m_{\mathcal{H}}(N) < 4$ by definition
  (so maximum possible = 3)

maximum possible $m_{\mathcal{H}}(N)$ when $N = 3$ and $k = 2$?
(add dichotomies on $x_1, x_2, x_3$ one at a time and check)
• 1 dichotomy: shatter any two points? no
• 2 dichotomies: shatter any two points? no
• 3 dichotomies: shatter any two points? no
• 4 dichotomies: shatter any two points? yes for one choice of the fourth dichotomy, no for another
• 5 dichotomies: shatter any two points? yes for every fifth dichotomy tried

maximum possible so far: 4 dichotomies


Restriction of Break Point (2/2)
what ‘must be true’ when minimum break point $k = 2$

• $N = 1$: every $m_{\mathcal{H}}(N) = 2$ by definition
• $N = 2$: every $m_{\mathcal{H}}(N) < 4$ by definition
  (so maximum possible = 3)
• $N = 3$: maximum possible $= 4 \ll 2^3$

so: break point $k$ restricts maximum possible $m_{\mathcal{H}}(N)$ a lot for $N > k$

idea: $m_{\mathcal{H}}(N) \le$ maximum possible $m_{\mathcal{H}}(N)$ given $k$ $\le \mathrm{poly}(N)$


Fun Time
When minimum break point $k = 1$, what is the maximum possible $m_{\mathcal{H}}(N)$ when $N = 3$?

Reference Answer: 1

  x1 x2 x3
  o  o  o

Because $k = 1$, the hypothesis set cannot even shatter one point. Thus, no ‘column’ of the table can contain both o and x. Then, after including the first dichotomy, it is not possible to include any other, different dichotomy. Thus, the maximum possible $m_{\mathcal{H}}(N)$ is 1.

Theory of Generalization: Bounding Function: Basic Cases


Bounding Function

bounding function $B(N, k)$:
maximum possible $m_{\mathcal{H}}(N)$ when break point $= k$

• combinatorial quantity:
  maximum number of length-$N$ vectors with (o, x)
  while ‘no shatter’ any length-$k$ subvectors
• independent of the details of $\mathcal{H}$
  e.g. $B(N, 3)$ bounds both
  • positive intervals ($k = 3$)
  • 1D perceptrons ($k = 3$)

new goal: $B(N, k) \le \mathrm{poly}(N)$?
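The combinatorial definition above can be checked directly for tiny $N$. The following is a minimal brute-force sketch, not part of the lecture: the names `shatters` and `bounding_function` are hypothetical, and the search is exponential, so it is only practical for $N \le 4$ or so.

```python
from itertools import combinations, product

def shatters(dichotomies, cols):
    """True if the dichotomies realize all 2^len(cols) patterns on the given columns."""
    seen = {tuple(d[c] for c in cols) for d in dichotomies}
    return len(seen) == 2 ** len(cols)

def bounding_function(n, k):
    """Brute-force B(n, k): the largest set of length-n (o, x) vectors
    that does not shatter any k of the n positions. Assumes k >= 1."""
    if k > n:
        return 2 ** n  # no group of k positions exists, so all vectors are allowed
    all_dichotomies = list(product("ox", repeat=n))
    col_subsets = list(combinations(range(n), k))
    # Try set sizes from largest to smallest; the first feasible size is B(n, k).
    for size in range(len(all_dichotomies), 0, -1):
        for subset in combinations(all_dichotomies, size):
            if not any(shatters(subset, cols) for cols in col_subsets):
                return size
    return 0

print(bounding_function(3, 2))  # N = 3, k = 2
print(bounding_function(3, 1))  # N = 3, k = 1
```

Because the function only looks at (o, x) vectors, it never refers to a hypothesis set, which is the sense in which $B(N, k)$ is independent of $\mathcal{H}$.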


Table of Bounding Function

• $B(2, 2) = 3$ (maximum $< 4$)
• $B(3, 2) = 4$ (‘pictorial’ proof previously)
• $B(N, 1) = 1$ (see previous quiz)
• $B(N, k) = 2^N$ for $N < k$
  (including all dichotomies, not violating ‘breaking condition’)
• $B(N, k) = 2^N - 1$ for $N = k$
  (removing a single dichotomy satisfies ‘breaking condition’)

B(N,k) | k=1    2    3    4    5    6
N=1    |   1    2    2    2    2    2
N=2    |   1    3    4    4    4    4
N=3    |   1    4    7    8    8    8
N=4    |   1    .    .   15   16   16
N=5    |   1    .    .    .   31   32
N=6    |   1    .    .    .    .   63
(. = not yet determined)

more than halfway done! :-)


Fun Time
For the 2D perceptrons, which of the following claims is true?
(1) minimum break point $k = 2$
(2) $m_{\mathcal{H}}(4) = 15$
(3) $m_{\mathcal{H}}(N) < B(N, k)$ when $N = k =$ minimum break point
(4) $m_{\mathcal{H}}(N) > B(N, k)$ when $N = k =$ minimum break point

Reference Answer: (3)
As discussed previously, the minimum break point for 2D perceptrons is 4, with $m_{\mathcal{H}}(4) = 14$. Also, note that $B(4, 4) = 15$. So the bounding function $B(N, k)$ can be ‘loose’ in bounding $m_{\mathcal{H}}(N)$.

Theory of Generalization: Bounding Function: Inductive Cases
Estimating $B(4, 3)$

• $B(4, 3)$ shall be related to $B(3, ?)$:
  ‘adding’ one point from $B(3, ?)$

next: reduce $B(4, 3)$ to $B(3, ?)$


‘Achieving’ Dichotomies of $B(4, 3)$

after checking all $2^{2^4}$ sets of dichotomies, the winner is ...

[table: 11 dichotomies, numbered 01 to 11, on $x_1, x_2, x_3, x_4$]

how to reduce $B(4, 3)$ to $B(3, ?)$ cases?


Reorganized Dichotomies of $B(4, 3)$

after checking all $2^{2^4}$ sets of dichotomies, the winner is ...

[table: the same 11 dichotomies on $x_1, x_2, x_3, x_4$, regrouped by color]

orange: pair; purple: single


Estimating Part of $B(4, 3)$ (1/2)
$B(4, 3) = 11 = 2\alpha + \beta$

• $\alpha + \beta$: dichotomies on $(x_1, x_2, x_3)$
• $B(4, 3)$ ‘no shatter’ any 3 inputs
  ⟹ $\alpha + \beta$ ‘no shatter’ any 3

$\alpha + \beta \le B(3, 3)$


Estimating Part of $B(4, 3)$ (2/2)
$B(4, 3) = 11 = 2\alpha + \beta$

• $\alpha$: dichotomies on $(x_1, x_2, x_3)$ with $x_4$ paired
• $B(4, 3)$ ‘no shatter’ any 3 inputs
  ⟹ $\alpha$ ‘no shatter’ any 2
  (if the $\alpha$ part shattered 2 of these inputs, those 2 together with the paired $x_4$ would be 3 shattered inputs)

$\alpha \le B(3, 2)$


Putting It All Together

$B(4, 3) = 2\alpha + \beta$
$\alpha + \beta \le B(3, 3)$
$\alpha \le B(3, 2)$
⟹ $B(4, 3) \le B(3, 3) + B(3, 2)$

In general:

$B(N, k) = 2\alpha + \beta$
$\alpha + \beta \le B(N-1, k)$
$\alpha \le B(N-1, k-1)$
⟹ $B(N, k) \le B(N-1, k) + B(N-1, k-1)$

B(N,k) | k=1    2    3    4    5    6
N=1    |   1    2    2    2    2    2
N=2    |   1    3    4    4    4    4
N=3    |   1    4    7    8    8    8
N=4    |   1   ≤5   11   15   16   16
N=5    |   1   ≤6  ≤16  ≤26   31   32
N=6    |   1   ≤7  ≤22  ≤42  ≤57   63

now have upper bound of bounding function
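The table above can be regenerated mechanically from the three boundary cases and the inductive inequality. The sketch below is an illustration only (the name `bound_table` is hypothetical): it treats the inequality as if it were an equality, so every entry with $N > k \ge 2$ is an upper bound on $B(N, k)$ rather than a claimed exact value.

```python
def bound_table(max_n, max_k):
    """Upper bounds on B(N, k) from the boundary cases plus the inductive inequality."""
    table = {}
    for n in range(1, max_n + 1):
        for k in range(1, max_k + 1):
            if k == 1:
                table[n, k] = 1               # B(N, 1) = 1
            elif n < k:
                table[n, k] = 2 ** n          # every dichotomy is allowed
            elif n == k:
                table[n, k] = 2 ** n - 1      # drop a single dichotomy
            else:
                # B(N, k) <= B(N-1, k) + B(N-1, k-1)
                table[n, k] = table[n - 1, k] + table[n - 1, k - 1]
    return table

table = bound_table(6, 6)
for n in range(1, 7):
    print(n, [table[n, k] for k in range(1, 7)])
```

Rows are filled in increasing $N$, so both entries on the right-hand side of the inequality are always available when needed.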


Bounding Function: The Theorem

$$B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i} \qquad \text{(highest term } N^{k-1}\text{)}$$

• simple induction using boundary and inductive formula
• for fixed $k$, $B(N, k)$ upper bounded by $\mathrm{poly}(N)$
  ⟹ $m_{\mathcal{H}}(N)$ is $\mathrm{poly}(N)$ if break point exists

‘$\le$’ can be ‘$=$’ actually,
go play and prove it if math lover! :-)
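The induction mentioned above rests on two facts: the binomial sum agrees with the boundary cases, and it satisfies the same recurrence as the inductive formula (by Pascal's rule). A small numerical check of both facts, as a sketch with hypothetical names `binomial_sum` and `check_induction`, might look like this; it is a sanity check on a finite range, not a proof.

```python
from math import comb

def binomial_sum(n, k):
    """Right-hand side of the theorem: sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))

def check_induction(max_n, max_k):
    """Check boundary cases and the recurrence for all 1 <= n <= max_n, 1 <= k <= max_k."""
    for n in range(1, max_n + 1):
        for k in range(1, max_k + 1):
            s = binomial_sum(n, k)
            if k == 1:
                assert s == 1             # matches B(N, 1) = 1
            if n < k:
                assert s == 2 ** n        # matches B(N, k) = 2^N for N < k
            if n == k:
                assert s == 2 ** n - 1    # matches B(N, k) = 2^N - 1 for N = k
            if n > 1 and k > 1:
                # same recurrence as B(N, k) <= B(N-1, k) + B(N-1, k-1)
                assert s == binomial_sum(n - 1, k) + binomial_sum(n - 1, k - 1)
    return True

print(check_induction(30, 30))
```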





The Three Break Points

$$B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i} \qquad \text{(highest term } N^{k-1}\text{)}$$

• positive rays: $m_{\mathcal{H}}(N) = N + 1 \le N + 1$
  (o x) $m_{\mathcal{H}}(2) = 3 < 2^2$: break point at 2
• positive intervals: $m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1 \le \frac{1}{2}N^2 + \frac{1}{2}N + 1$
  (o x o) $m_{\mathcal{H}}(3) = 7 < 2^3$: break point at 3
• 2D perceptrons: $m_{\mathcal{H}}(N) = \,?\, \le \frac{1}{6}N^3 + \frac{5}{6}N + 1$
  $m_{\mathcal{H}}(4) = 14 < 2^4$: break point at 4

can bound $m_{\mathcal{H}}(N)$ by only one break point
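The three polynomial bounds above are just the binomial sum evaluated at $k = 2$, $3$ and $4$. As a quick illustration (function names are hypothetical, and the perceptron growth function itself is not computed because the slide leaves it as ‘?’), the closed forms can be compared against the sum:

```python
from math import comb

def break_point_bound(n, k):
    """Sum of C(n, i) for i = 0 .. k-1, the polynomial bound for break point k."""
    return sum(comb(n, i) for i in range(k))

def positive_rays(n):
    return n + 1

def positive_intervals(n):
    return n * (n + 1) // 2 + 1  # (1/2)N^2 + (1/2)N + 1, always an integer

for n in range(1, 8):
    rays_bound = break_point_bound(n, 2)        # break point 2
    intervals_bound = break_point_bound(n, 3)   # break point 3
    perceptron_bound = break_point_bound(n, 4)  # break point 4: (N^3 + 5N + 6) / 6
    print(n, positive_rays(n), rays_bound,
          positive_intervals(n), intervals_bound, perceptron_bound)
```

For positive rays and positive intervals the bound is met with equality; for 2D perceptrons only the bound is known from the break point alone.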


Fun Time
For 1D perceptrons (positive and negative rays), we know that $m_{\mathcal{H}}(N) = 2N$. Let $k$ be the minimum break point. Which of the following is not true?
(1) $k = 3$
(2) for some integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$
(3) for all integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$
(4) for all integers $N > 2$, $m_{\mathcal{H}}(N) < \sum_{i=0}^{k-1} \binom{N}{i}$

Reference Answer: (3)
The proof is generally trivial by listing the definitions. For (2), $N = 1$ or $2$ gives the equality. One thing to notice is (4): the upper bound can be ‘loose’.
Theory of Generalization: A Pictorial Proof


BAD Bound for General $\mathcal{H}$

want:

$$P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\right] \le 2 \cdot m_{\mathcal{H}}(N) \cdot \exp\left(-2 \epsilon^2 N\right)$$

actually, when $N$ large enough,

$$P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\right] \le 2 \cdot 2\, m_{\mathcal{H}}(2N) \cdot \exp\left(-2 \cdot \tfrac{1}{16} \epsilon^2 N\right)$$

next: sketch of proof


Step 1: Replace $E_{out}$ by $E_{in}'$

$$\tfrac{1}{2} P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\right] \le P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{in}'(h)| > \tfrac{\epsilon}{2}\right]$$

• $E_{in}(h)$ finitely many, $E_{out}(h)$ infinitely many:
  replace the evil $E_{out}$ first
• how? sample verification set $\mathcal{D}'$ of size $N$ to calculate $E_{in}'$
• BAD $h$ of $E_{in} - E_{out}$ ⟹ (probably) BAD $h$ of $E_{in} - E_{in}'$

evil $E_{out}$ removed by verification with ‘ghost data’

[figure: probability distribution of $E_{in}$, $E_{in}'$]


Step 2: Decompose $\mathcal{H}$ by Kind

$$\text{BAD} \le 2 P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{in}'(h)| > \tfrac{\epsilon}{2}\right] \le 2\, m_{\mathcal{H}}(2N)\, P\left[\text{fixed } h \text{ s.t. } |E_{in}(h) - E_{in}'(h)| > \tfrac{\epsilon}{2}\right]$$

• $E_{in}$ with $\mathcal{D}$, $E_{in}'$ with $\mathcal{D}'$:
  now $m_{\mathcal{H}}$ comes to play
• how? infinite $\mathcal{H}$ becomes $|\mathcal{H}(x_1, \ldots, x_N, x_1', \ldots, x_N')|$ kinds
• union bound on $m_{\mathcal{H}}(2N)$ kinds

use $m_{\mathcal{H}}(2N)$ to calculate BAD-overlap properly


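Step 3: Use Hoeffding without Replacement

$$\text{BAD} \le 2\, m_{\mathcal{H}}(2N)\, P\left[\text{fixed } h \text{ s.t. } |E_{in}(h) - E_{in}'(h)| > \tfrac{\epsilon}{2}\right] \le 2\, m_{\mathcal{H}}(2N) \cdot 2 \exp\left(-2 \left(\tfrac{\epsilon}{4}\right)^2 N\right)$$

• consider bin of $2N$ examples, choose $N$ for $E_{in}$, leave others for $E_{in}'$

  $$|E_{in} - E_{in}'| > \tfrac{\epsilon}{2} \iff \left|E_{in} - \tfrac{E_{in} + E_{in}'}{2}\right| > \tfrac{\epsilon}{4}$$

• so? just ‘smaller bin’, ‘smaller $\epsilon$’, and Hoeffding without replacement

[figure: sample for $E_{in}$ drawn from a small bin]

use Hoeffding after zooming to fixed $h$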
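For one fixed $h$, the ‘small bin’ picture can be simulated: put $2N$ error indicators in a bin, draw $N$ of them without replacement for $E_{in}$, and use the rest for $E_{in}'$. The sketch below is an illustration under assumed values ($N = 100$, $\epsilon = 0.4$, half of the bin in error, a fixed random seed); the function name `ghost_gap_frequency` is hypothetical. It compares the simulated frequency of a large gap with the bound $2\exp(-2(\epsilon/4)^2 N)$ from the step above.

```python
import random
from math import exp

def ghost_gap_frequency(n, error_rate, epsilon, trials, seed=0):
    """Estimate P[|E_in - E_in'| > epsilon / 2] for one fixed h by simulation."""
    rng = random.Random(seed)
    # Bin of 2N examples; a 1 marks an example on which the fixed h makes an error.
    n_errors = round(2 * n * error_rate)
    bin_ = [1] * n_errors + [0] * (2 * n - n_errors)
    hits = 0
    for _ in range(trials):
        rng.shuffle(bin_)
        e_in = sum(bin_[:n]) / n      # N examples drawn without replacement
        e_ghost = sum(bin_[n:]) / n   # the remaining N 'ghost' examples
        if abs(e_in - e_ghost) > epsilon / 2:
            hits += 1
    return hits / trials

n, epsilon = 100, 0.4
empirical = ghost_gap_frequency(n, error_rate=0.5, epsilon=epsilon, trials=2000)
hoeffding = 2 * exp(-2 * (epsilon / 4) ** 2 * n)
print(empirical, hoeffding)
```

Shuffling the bin and splitting it in half is one simple way to realize sampling without replacement; the bound is typically far from tight in such a simulation.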


That's All!
Vapnik-Chervonenkis (VC) bound:

$$P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\right] \le 4\, m_{\mathcal{H}}(2N) \exp\left(-\tfrac{1}{8} \epsilon^2 N\right)$$

• replace $E_{out}$ by $E_{in}'$
• decompose $\mathcal{H}$ by kind
• use Hoeffding without replacement

2D perceptrons:
• break point? 4
• $m_{\mathcal{H}}(N)$? $O(N^3)$
learning with 2D perceptrons feasible! :-)

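Fun Time
For positive rays, $m_{\mathcal{H}}(N) = N + 1$. Plug it into the VC bound for $\epsilon = 0.1$ and $N = 10000$. What is the VC bound of BAD events?

$$P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\right] \le 4\, m_{\mathcal{H}}(2N) \exp\left(-\tfrac{1}{8} \epsilon^2 N\right)$$

(1) $2.77 \times 10^{-87}$
(2) $5.54 \times 10^{-83}$
(3) $2.98 \times 10^{-1}$
(4) $2.29 \times 10^{2}$

Reference Answer: (3)
Simple calculation. Note that the BAD probability bound is not very small even with 10000 examples.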
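The ‘simple calculation’ can be written out as follows. This is a minimal sketch; `vc_bound` is a hypothetical helper name, and the growth function for positive rays is evaluated at $2N$, as the VC bound requires.

```python
from math import exp

def vc_bound(growth_at_2n, epsilon, n):
    """VC bound: 4 * m_H(2N) * exp(-(1/8) * epsilon^2 * N)."""
    return 4 * growth_at_2n * exp(-(epsilon ** 2) * n / 8)

n, epsilon = 10000, 0.1
growth_at_2n = 2 * n + 1  # positive rays: m_H(2N) = 2N + 1
bound = vc_bound(growth_at_2n, epsilon, n)
print(f"{bound:.3g}")  # prints 0.298
```

The exponent is only $-12.5$ here, which is why the factor $4\, m_{\mathcal{H}}(2N) = 80004$ keeps the bound close to 0.3.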
Summary
(1) When Can Machines Learn?
(2) Why Can Machines Learn?

Lecture 5: Training versus Testing

Lecture 6: Theory of Generalization
• Restriction of Break Point
  break point ‘breaks’ consequent points
• Bounding Function: Basic Cases
  $B(N, k)$ bounds $m_{\mathcal{H}}(N)$ with break point $k$
• Bounding Function: Inductive Cases
  $B(N, k)$ is $\mathrm{poly}(N)$
• A Pictorial Proof
  $m_{\mathcal{H}}(N)$ can replace $M$ with a few changes

• next: how to ‘use’ the break point?

(3) How Can Machines Learn?
(4) How Can Machines Learn Better?